Shadows: a new means of representing documents

Authors

  • Matheus Silva Mota
  • Regina Machado
  • Matheus Silva
  • Claudia Maria Bauzer Medeiros
  • André Santanchè
  • Angelo Roncalli Alencar Brayner
Abstract

Document production tools are ubiquitous, resulting in exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers document exchange, as well as annotation and retrieval. While information retrieval mechanisms concentrate on textual features (corpus analysis), annotation approaches either target specific formats or require that a document follow interoperable standards defined via schemas. This work presents our effort to handle these problems with a more flexible solution. Rather than trying to modify or convert the document itself, or to target only textual characteristics, the strategy described in this work is based on an intermediate descriptor: the document shadow. A shadow represents domain-relevant aspects and elements of both the structure and the content of a given document. Shadows are not restricted to describing textual features; they also cover other elements, such as multimedia artifacts. Furthermore, shadows can be stored in a database, thereby supporting queries on document structure and content regardless of document format.
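
The idea of storing a shadow in a database and querying it independently of the source format can be illustrated with a minimal sketch. This is not the authors' implementation: the `ShadowElement` fields, the `shadow` table layout and the helper functions `store_shadow` and `documents_with` are all hypothetical names chosen for illustration, and the sample documents are invented.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ShadowElement:
    """One domain-relevant element of a document (hypothetical layout)."""
    doc_id: str    # identifier of the source document
    kind: str      # element type, e.g. "section" or "image"
    position: int  # order of the element within the document
    content: str   # text, caption, or a reference to the artifact


def store_shadow(conn, elements):
    """Persist shadow elements in a single relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shadow "
        "(doc_id TEXT, kind TEXT, position INTEGER, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO shadow VALUES (?, ?, ?, ?)",
        [(e.doc_id, e.kind, e.position, e.content) for e in elements],
    )
    conn.commit()


def documents_with(conn, kind):
    """Return the ids of documents whose shadow has an element of `kind`."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM shadow WHERE kind = ? ORDER BY doc_id",
        (kind,),
    )
    return [row[0] for row in rows]


# Invented sample data: three documents in three different formats.
conn = sqlite3.connect(":memory:")
store_shadow(conn, [
    ShadowElement("report.pdf", "section", 0, "Introduction"),
    ShadowElement("report.pdf", "image", 1, "Figure 1: system overview"),
    ShadowElement("notes.odt", "section", 0, "Meeting notes"),
    ShadowElement("slides.pptx", "image", 0, "Architecture diagram"),
])

print(documents_with(conn, "image"))  # ['report.pdf', 'slides.pptx']
```

The query never inspects the original files, only their shadows, which is why the PDF and the presentation can be retrieved by the same structural condition.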

Similar resources

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

Shadow-driven Document Representation: A summarization-based strategy to represent non-interoperable documents

Document production tools are present everywhere, resulting in an exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers document exchange, as well as their annotation, indexing and retrieval. Existing approaches to these tasks either concentrate on specific formats or require representing a document’s content using interoperable standards or schemas. Thi...

No fire without smoke: smoke rendering and light interaction for real-time computer graphics

Realism in computer graphics depends upon digitally representing what we see in the world with careful attention to detail, which usually requires a high degree of complexity in modelling the scene. With some computer graphics applications, developers have to limit the complexity of the scene to allow the application to run in real time on modern consumer-grade graphics hardware. This trade-off ...

Comparison of Aerobic Sporadic Bacilli Structure with Electron Microscopy

1. The existence of differences and similarities in the surface features not only of different organisms or groups but also within given species has been demonstrated by a variety of techniques. 2. The different reactions of various organisms to the Gram stain might well be taken as one piece of evidence, the use of the electron microscope and associated preparative techniques (includi...

Removing car shadows in video images using entropy and Euclidean distance features

Detecting car motion in video frames is one of the key subjects in the computer vision community. In recent years, different approaches have been proposed to address this issue. One of the main challenges for image processing systems that detect cars is the cars' shadows. Shadows change the appearance of cars in such a way that they may seem attached to neighboring cars. This study aims ...

Publication date: 2012